Skip to content

Fix refer model with RL#1660

Merged
hhaAndroid merged 1 commit intoInternLM:mainfrom
hhaAndroid:fix_ref_model_rl
Apr 9, 2026
Merged

Fix refer model with RL#1660
hhaAndroid merged 1 commit intoInternLM:mainfrom
hhaAndroid:fix_ref_model_rl

Conversation

@hhaAndroid
Copy link
Copy Markdown
Collaborator

No description provided.

@hhaAndroid
Copy link
Copy Markdown
Collaborator Author

@claude review

@hhaAndroid hhaAndroid changed the title fix refer model with RL Fix refer model with RL Apr 8, 2026
Comment thread xtuner/v1/rl/base/worker.py
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 8, 2026

Claude: ## Summary

Clean fix that aligns compute_ref_logprobs with the existing compute_actor_logprobs pattern by delegating logprob computation to LogProbContext instead of manually calling gather_logprobs on full logits. This is semantically equivalent, but gains chunked-mode memory efficiency and reduces code duplication.

Issues

Nit

  • [worker.py:393] Minor naming: local loss_ctx could be ref_loss_ctx or logprob_ctx for clarity vs. the dict key.

Process

  • PR body is empty — per CLAUDE.md, bug fix PRs should include Root Cause, Fix, and Test Plan sections. Even for a straightforward change like this, documenting why the old code was wrong (e.g. OOM on large vocabs? inconsistency with actor path?) helps future readers.
  • No regression test — CLAUDE.md requires bug fix PRs to include a test that reproduces the original bug.

Verdict

APPROVE — the code change itself is correct and well-motivated.

@hhaAndroid
Copy link
Copy Markdown
Collaborator Author

@claude please approve it

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change is correct — aligns compute_ref_logprobs with the existing compute_actor_logprobs pattern by delegating logprob computation to LogProbContext.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 8, 2026

Claude: Approved the PR as requested. The code change correctly aligns compute_ref_logprobs with the existing compute_actor_logprobs pattern.

@hhaAndroid hhaAndroid merged commit c3c53d8 into InternLM:main Apr 9, 2026
10 of 11 checks passed
RangiLyu pushed a commit to RangiLyu/xtuner that referenced this pull request Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant